Chapter I 

STATEMENT OF THE PROBLEM 

Traditional methods of curriculum evaluation, based on 
test technology and the use of the experimental method, are 
no longer considered the only or the best methods. Forehand 
(1964) avers that the experimental curriculum cannot be treated 
as a unitary or independent variable and, further, that it is 
difficult to define a control group that differs only with 
respect to the "central innovative idea" of the experimental 
curriculum. Forehand suggests that a number of new strategies 
need to be considered and empirically investigated. 

Heath (1962) is in general agreement with Forehand, but 
perhaps for a different reason. He states that in the evaluation 
of new curricula the evaluator can be legitimately concerned 
with many types of inquiry. But one question, often asked 
and inherently unanswerable, is "Which curriculum, the old or 
new, is better?" Heath's argument is that the usual criterion 
instruments used in a study designed to answer that question 
are usually designed to measure achievement in only one of the 
curricula. One of the curricula will surely look superior on 
the basis of such a comparison. Heath's arguments suggest 
the development of new approaches to the evaluation and testing 
of curricula. 

Cronbach (1963) has suggested the use of item analysis data 
as one new approach to curriculum evaluation. Rather than 
administering a usual criterion test composed of the same sample 
of items to a large pool of students, he has suggested that a 
large pool of items be written from which relatively small 
samples may be drawn and administered to different samples of 
students. In the latter instance the curriculum evaluator has 
some information on a wide range of item types from the entire 
pool and, by compiling information on item difficulty and item 
correlations, could identify those elements of the curriculum 
which need revision. 
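
A minimal sketch of the item-sampling idea follows. It is an illustration only: the sizes (100 students, a pool of 80 items, 12 items per student) and the name `assign_items` are assumptions made for the example and do not come from Cronbach or from this study.

```python
import random
from collections import Counter

def assign_items(n_students, pool_size, items_per_student, seed=0):
    """Give each student a small random sample of item numbers from the pool."""
    rng = random.Random(seed)
    pool = range(pool_size)
    # rng.sample draws without replacement, so no student sees an item twice.
    return [rng.sample(pool, items_per_student) for _ in range(n_students)]

# Hypothetical sizes chosen only for illustration.
assignments = assign_items(n_students=100, pool_size=80, items_per_student=12)

# Count how many students were given each item of the pool.
exposure = Counter(item for booklet in assignments for item in booklet)

print(len(assignments[0]))      # items answered by a single student
print(sum(exposure.values()))   # total item administrations across all students
print(len(exposure))            # distinct pool items administered at least once
```

Each student answers only a few items, yet the administrations are spread across the pool, which is what allows item difficulties to be compiled for far more items than any one student could take.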

Since the advent of the Nebraska University Curriculum 
Development Center (referred to hereafter as the Curriculum 
Center) in 1961 and the subsequent development of a K-12 
English curriculum, the Curriculum Center has been involved 
with the evaluation of its product. The availability of the 
materials of the Curriculum Center and the use of students in 
the try-out population as subjects made it possible to try a 
new approach to curriculum evaluation. The approach set down 
by Cronbach seemed to have merit. 

Purpose 

The purpose of this investigation was to explore the 
feasibility of using an item analysis approach in evaluating 
new curricula. This approach presumes that "adequate" pools of 
items can be established for particular units of the new 
English curriculum at the ninth-grade level. The means of 
comparison will be the contrasting of item data collected by 
use of a "usual" criterion test composed of a limited number 
of items with item data secured for a relatively large pool of 
items administered a few at a time to each student. 

Review of Literature 

The Curriculum Center has received considerable financial 
support since its inception in 1961. It was initiated by 
monies from the Woods Charitable Fund and The University of 
Nebraska, and later received funds from the U. S. Office of 
Education. It is one of the curriculum centers in the nation 
which is attempting to set down a series of comprehensive 
courses of study which will integrate into an overall structure. 
The Piaget-Bruner views have greatly influenced the structure 
of the new English curriculum at the University of Nebraska. 

The aim of the Nebraska Curriculum Center as stated in the 
proposal is that of "creating a systematic program in (English) 
composition . . . (that would) lead the student, step by step, 
to a competent knowledge of prose discourse and a mastery of 
its resources." At this stage the curriculum is not considered 
to be a final product. The directors of the Center are 
attempting to evaluate their efforts in a variety of ways, and 
through these evaluations modify the curriculum towards a 
relatively final state. 

One means of evaluation has been the teachers' reactions 
to the materials through their experience with them. These 
reactions and the correlated recommendations have been useful 
guides to revision, but do not constitute a powerful check 
on the efficiency of the program. At this juncture more 
objective techniques of evaluation have been initiated. For 
example, a longitudinal study of the syntax and content of 
children's compositions (grades 2-6) is now being conducted 
by one of the members of the Curriculum Center. The study 
reported here will be done using selected units of the ninth 
grade curriculum. Others will follow depending on the outcome 
of these feasibility investigations. 

Concurrent with the efforts of the curriculum centers to 
evolve more effective courses of study, and in fact stimulated 
by their need to establish means for effective evaluation of 
their products, a loosely arranged group of persons who might 
be called curriculum evaluators has debated the several 
alternative routes to the problem. In part, the views of Forehand, 
Heath, and Cronbach have previously been cited. An elaboration 
of their views will follow. 

The major arguments for considering some new approaches 
to curriculum evaluation rather than the usual "experimental" 
curriculum versus the "control" curriculum were that: 

(1) The experimental curriculum cannot be treated as a 
unitary independent variable; there are too many variables 
within that variable to make this assumption (Forehand, 1964). 

(2) It is nearly impossible to define a control group 
which differs only with respect to the "central innovative 
idea" of the experimental curriculum (Forehand, 1964). 

(3) A test designed to determine the effectiveness of 
curricula must of necessity favor (be based on the objectives 
of) either the "old" or the "new" curricula. Therefore, it is 
impossible to answer the question of "which is better" (Heath, 
1962). 

(4) These experimental-control comparisons often result 
in average test score differences which are quite small 
relative to the wide differences among and within classes 
taking the same courses (Cronbach, 1963). 

(5) Only present versions of the curricula are compared; 
extensive efforts to bolster the "inferior" curriculum may 
result in a new "winner" (Cronbach, 1963). 

(6) Attempts to equate the classes associated with the 
different curricular patterns are almost never successful. 
The teacher(s) and group(s) using the "experimental" curriculum 
will typically put forth greater effort, thus contaminating 
the results with factors other than the several that are part 
of the new curriculum (Cronbach, 1963; Forehand, 1964). 

The above arguments suggest some shifts in attacking the 
problem of curriculum evaluation, but they do not suggest that 
curricular patterns should no longer be compared. Rather, the 
questions asked should be, "What are the attainments which 
result from the new curriculum?" and "What attainments associated 
with the old curriculum are not associated with the new curriculum?" 
The kinds of questions asked in the latter instance do not depend 
on a single test instrument with but a single total score to 
be used as the criterion measure. These questions suggest a 
more analytical approach to the problem of evaluation involving 
a number of sub-tests and scales with multiple comparisons 
resulting. 

Pursuing this line of reasoning, some of the new approaches 
suggested in evaluating curricula follow below: 

(1) Elements of the curriculum may be isolated for study. 
One example cited would be that the attitudes of the teachers of 
the new curriculum could be assessed and related to student 
performance. Or the effects of a special teacher training program 
on the use of new materials could be related to performance 
(Forehand, 1964; Cronbach, 1963). 

(2) Descriptive data of the performances of the samples 
of students taking the new materials should provide valuable 
information for the curriculum developer (Cronbach, 1963; 
Forehand, 1964; Walbesser, 1963). Any number of observations, 
tests, and scales could be employed to determine the 
effectiveness of the elements of a new curriculum. For example, certain 
behaviors could be assessed at several points during the school 
year and the results plotted graphically against the introduction 
of curriculum elements. The use of item data as was suggested 
by Cronbach and as will be employed in this study is another 
example. 

(4) Observations of the interactions of teachers with 
pupils may provide some useful information for the curriculum 
developer (Forehand, 1964). This technique provides a check 
on the teacher's use of the materials and ideas of the curriculum 
developer. 

(5) Small scale studies of alternative versions of the 
same course may yield much more useful information than field 
trials (Cronbach, 1963). The point of this method is to 
reduce the number of differences in treatments to be controlled 
and to provide feedback to the developer before the course 
materials become static. 

(6) The development of new test instruments has been 
suggested (Heath, 1962). Heath's (1964) instrument designed 
to test cognitive preferences and the "Tab Test" of Glaser, 
Damrin, and Gardner (1964) are examples of kinds of instruments 
which differ from the usual evaluation instruments. 

Objectives 

The objectives of this investigation are listed below. 

(1) Many published and unpublished instruments purport 
to measure outcomes of English or language arts courses. One 
purpose of this study is to examine these instruments to 
determine which are most useful for measuring the outcomes of 
the new English curriculum, and at what grade levels they are 
most useful. 

(2) A pool of items will be developed for each of three 
units of ninth-grade material (Satire, Uses of Language, and 
Syntax and Rhetoric of the Sentence have been selected). 

(3) Using the pools of items developed as per the second 
objective, the item data for the entire pools administered a 
few at a time to large samples of students will be compared 
with the results of a usual criterion test (a few representative 
items given to a relatively large sample of students). 

Chapter II 
PROCEDURES 

Procedures for the First Objective 

The primary sources of tests of English were obtained 
from Buros's Fifth Mental Measurements Yearbook and Tests in 
Print. Each of the test companies that advertised English tests 
was contacted and specimen test sets were secured. In addition, 
companies that publish English texts and supplementary materials 
were contacted if there was a chance that their authors might 
have produced sets of items to be used with that particular 
text or for specific units within. 

As the materials arrived two persons with training in the 
new English curriculum as developed by the Curriculum Center 
evaluated the tests relative to their appropriateness for 
assessing the effectiveness of the Nebraska English curriculum. 
Reports were written which included the title of the test, a 
statement of the test format, a description of the areas assessed, 
and a statement of the appropriateness of the test for use with 
the Nebraska English program. 

After the majority of reports were completed the writer, 
the co-directors of the Curriculum Center, and the evaluators 
met to discuss the accuracy of the reports, and to identify 
any particularly promising instruments. A report of the results 
of those instruments and items which are promising and a listing 
of those materials which were not considered useful will be made 
in the results section. 

Procedures for the Second Objective 

The three units selected from the ninth-grade curriculum 
for inclusion in this study were Satire (literature), Uses of 
Language (language), and Syntax and Rhetoric of the Sentence 
(composition). The pools of items for each unit were constructed 
by a research assistant with training in the Nebraska English 
curriculum and some special instruction in item writing. She 
was instructed to write items which would represent the major 
outcomes of each of the units, and to write the items 
as simply as possible. It was hoped that these instructions 
would yield a number of items that would be so easy for those 
who have had the unit that they could be labeled mastery items. 

After a preliminary list of items was developed by the 
item writer, teachers in the try-out schools for the ninth-grade 
curriculum and personnel in the Curriculum Center reacted to the 
items in terms of their representativeness and their clarity. 
The preliminary list was then rewritten with some deletions and 
additions, and the reactions of these persons were again sought 
until the final list of items for each unit was completed. 

At this point the items should have been tried out in the 
ninth-grade classrooms in one of the Curriculum Center's 
cooperating schools to gain some additional information as to 
the clarity and difficulty level of the items. This step was 
omitted because in several instances the cooperating schools 
were teaching or were about to teach the unit under 
consideration, and if the study was to be completed this year the items 
had to be administered at a given time. It should be obvious 
that item data collected during a try-out phase would not have been 
used in the typical manner, but would have served only as a 
check on item ambiguity and clarity. However, some extremely 
difficult items might have been revised had this precaution been 
possible. 

The procedural phase of the second objective resulted in 
eighty-four multiple choice items for the unit on Syntax and 
Rhetoric of the Sentence, eighty-three items for the unit on 
Uses of Language, and sixty-four items for the unit on Satire, 
for a total pool of 231 items. 

Procedures for the Third Objective 

During the time that the pools of items were being 
developed, the schools cooperating with the Curriculum Center 
were contacted to determine the extent of their participation 
in the Nebraska English program. Those schools which were 
involved in ninth-grade programs were asked to participate 
in this study by sending a schedule of the units they planned 
to teach during the school year. The goal was four random 
samples of one hundred students each for each unit. Some of 
the students served as subjects for all three units when their 
unit schedule was favorable. 

Of the four random samples, two were tested prior to 
being exposed to the unit and two were tested after they were 
taught the unit. One of the samples (A) responded to the entire 
pool of items, each student taking ten to fifteen items selected 
at random, before exposure to the unit. The second sample 
(B) responded to the same pool of items administered in the same 
manner after exposure to the unit. The third sample (C) responded 
to a small number of items selected from the pool of items to be 
representative of the unit (the usual criterion test approach) 
before exposure to the unit. Finally, the fourth sample (D) 
responded to the same items as the C group, but only after 
exposure to the unit. This general process of data collection 
was followed for each of the units with the following exceptions. 

For the unit on Satire it was necessary to use eighth-grade 
students for the pre-test data (groups A and C). By the time 
the test items on Satire were in finished form all the 
ninth-grade students who were involved in the new English curriculum 
had been exposed to the unit. As this occurred toward the end 
of the school year the writer can find little problem in this 
modification. The eighth-grade students selected were deliberately 
taken from the same schools that were involved in the post-testing 
situation. 
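
A rough calculation, which is not part of the original analysis, indicates how many responses each item in a pool can be expected to receive in the sampling of item pool conditions (A and B). It assumes exactly 100 students per sample and items drawn uniformly at random from the pool; the function name is hypothetical.

```python
def expected_responses_per_item(n_students, items_per_student, pool_size):
    """Average number of students answering each item when every student
    takes `items_per_student` items drawn uniformly at random from the pool."""
    return n_students * items_per_student / pool_size

# Pool sizes reported for the three units.
pools = {
    "Syntax and Rhetoric of the Sentence": 84,
    "Uses of Language": 83,
    "Satire": 64,
}

for unit, size in pools.items():
    # Each student took between ten and fifteen items.
    low = expected_responses_per_item(100, 10, size)
    high = expected_responses_per_item(100, 15, size)
    print(f"{unit}: {low:.1f} to {high:.1f} responses per item")
```

Under these assumptions each pool item is answered by roughly a dozen to two dozen students per sample, whereas every item of a criterion test is answered by all of the students in its sample.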

Table 1 shows the schools involved in the study, the 
number of students in each condition from which the final 100 
subjects were drawn at random, and the condition to which each 
school was assigned. Several differences in the research 
methodology need to be discussed at this point. 

The administration began with the items on the unit of 
Syntax and Rhetoric of the Sentence, but the completion of the 
administration at that point was impossible because of the 
different schedules for teaching the units. In the earlier 
stages of collecting the data the writer tried to re-administer 
the items in the post-test position to those schools that 
participated in the pre-testing. Thus Arbor Heights Junior High in 
Omaha is in both test positions under the heading Criterion 
Test on the Table. The same is true of York Junior High under 
the heading Sampling of Item Pool on the Table. It would seem 
that comparison of gains by the same students as compared with 
gains by two different samples of students would prove to be of 
interest. These data will be reported in the chapter on results. 

It should also be noted that there are only 67 students 
instead of 100 or more in the Post-Test position under the 
heading Criterion Test for the unit on Syntax and Rhetoric of 
the Sentence. One teacher who had planned to teach the unit 
toward the end of the school year simply ran out of time; it 
was not possible at that point to make arrangements for a 
replacement school. 

Finally, since the point of the comparisons was to be 
between the usual criterion test technique and the sampling of 
item pool method, for the units of Satire and Uses of Language 
an effort was made to administer the criterion test to 
approximately half of each class and the sampling of item pool 
tests to the other half. This technique should result in less 
variation among the subjects for the comparisons of import. 
Thus in Table 1 one will note that the names of the same 
schools appear in the A and C positions and also in the B and D 
positions of the Table for those two units (Satire and Uses of 
Language). 

TABLE 1 

THE SAMPLING SCHEME BY UNIT 
AND TESTING POSITION 

SATIRE 

Sampling of Item Pool                Criterion Test 

Pre-Test (A)            N            Pre-Test (C)            N 
Everett 8th, Lincoln    76           Everett 8th, Lincoln    [illegible] 
Irving 8th, Lincoln     50           Irving 8th, Lincoln     [illegible] 
Total                   125          Total                   [illegible] 

Post-Test (B)           N            Post-Test (D)           N 
Everett, Lincoln        72           Everett, Lincoln        97 
Grand Island            23           Grand Island            29 
Irving, Lincoln         55           Irving, Lincoln         49 
Total                   150          Total                   175 

SYNTAX AND RHETORIC OF THE SENTENCE 

Sampling of Item Pool                Criterion Test 

Pre-Test (A)            N            Pre-Test (C)            N 
Irving, Lincoln         55           Arbor Heights, Omaha    155 
York                    [illegible]  Total                   155 

Post-Test (B)           N            Post-Test (D)           N 
Everett, Lincoln        133          Everett, Lincoln        67 

Repeat Test             N            Repeat Test             N 
York                    52           Arbor Heights, Omaha    153 

USES OF LANGUAGE 

Sampling of Item Pool                Criterion Test 

Pre-Test (A)            N            Pre-Test (C)            N 
George Norris, Omaha    26           George Norris, Omaha    31 
Grand Island            78           Grand Island            71 
Total                   104          Total                   102 

Post-Test (B)           N            Post-Test (D)           N 
Everett, Lincoln        [illegible]  Everett, Lincoln        85 
Irving, Lincoln         49           Irving, Lincoln         53 
York                    [illegible]  York                    30 
Total                   [illegible]  Total                   168 

The statistical analysis of the results will consist of the item 
difficulties for each sample, the item difficulties for the 
two samples of each testing method combined, the phi coefficient 
for the two samples of each testing method, and finally, test 
data including "t" test comparisons of the pre- and post-test 
groups who took the criterion test. 
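
The quantities named above can be sketched as follows. The report does not give its formulas, so this is one conventional reading, offered as an assumption: item difficulty as the proportion answering correctly, phi as the correlation between group membership (pre-test or post-test) and item outcome (right or wrong), and a pooled-variance t statistic for two independent groups. The function names and the small data lists are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance

def item_difficulty(responses):
    """Proportion of correct answers (1 = right, 0 = wrong) to one item."""
    return sum(responses) / len(responses)

def phi_coefficient(pre, post):
    """Phi for the 2x2 table of group (post/pre) by outcome (right/wrong)."""
    a = sum(post)          # post-test, right
    b = len(post) - a      # post-test, wrong
    c = sum(pre)           # pre-test, right
    d = len(pre) - c       # pre-test, wrong
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # Phi is undefined when a margin is empty; return 0.0 in that case.
    return (a * d - b * c) / denom if denom else 0.0

def pooled_t(group1, group2):
    """Independent-samples t statistic with pooled variance (group2 minus group1)."""
    n1, n2 = len(group1), len(group2)
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group2) - mean(group1)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical responses of two small groups to a single item.
pre_item = [0, 0, 0, 1]
post_item = [1, 1, 1, 0]
print(item_difficulty(pre_item), item_difficulty(post_item))
print(phi_coefficient(pre_item, post_item))

# Hypothetical total scores on a criterion test.
print(pooled_t([1, 2, 3], [2, 3, 4]))
```

With this orientation of the table, a positive phi indicates that the item was answered correctly more often after exposure to the unit than before it.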



Chapter III 
RESULTS 

The results reported herein will be presented in the same 
sequence as the objectives. 

Evaluation of Available English Test Instruments 

English tests were evaluated relative to their usefulness 
for assessing the outcomes of the Nebraska English program. 

The purpose of this phase of the study was to locate tests 
which would be especially useful for the new curriculum at the 
ninth-grade level. However, test materials at other grade levels 
were also secured and evaluated. 

Tests of Value 

Following are rather brief summaries of those tests which 
seemed to hold some promise for evaluating the new curriculum. 

1. Tests for Adventures in Reading. Chicago: Harcourt, 
Brace, & World, Inc., 1963. 

Tests are based on selections in Adventures in Reading, 
Laureate Edition. Most of the items and units test one's 
ability to memorize factual knowledge. The section that seems 
to have value in assessing the new English curriculum is 
entitled, "The Epic Tale." The Odyssey is the tale on which 
the assessment is based. There are two sections or kinds of 
tests used. 

a) The first section is entitled, "Understanding 
the selection." The items in this section are four-response, 
multiple-choice items which are based on recall of incidents, 
names, and places. Following is an example: 

"When Odysseus and his men entered the Cyclops' 
cave, they found that 

a. he was in great pain 
b. a feast was ready in their honor 
c. a message awaited them 
d. he was not there." p. 95 

b) The second section is entitled, "Appreciating 
characterization in the epic." The same kind of items are 
used to test understanding of the character. Following is an 
example: 

"Which description does not fit Odysseus? He is 

a. considerate but determined 
b. vain and inclined to boast 
c. self-reliant and inventive 
d. meek and quiet." p. 98 

Both sections contain items that are useful in testing the 
Epic Unit as taught to ninth-grade students in the new English 
curriculum. 

2. Ullman-Clark Test on Classical References and Allusions. 
Iowa City: Bureau of Educational Research and Service, 1928. 

Some of the items are directed to the Odyssey and would 
therefore be appropriate to use for the Epic Unit at the 
ninth-grade level. The items are five-response, multiple-choice style 
and are based on highly factual information. 

3. Iowa Tests of Educational Development. Chicago: 
Science Research Associates, 1960. 

A general test of educational change presented in 
multiple-choice format. Three sections pertain to English. The first 
appropriate section is entitled, "Correctness and Appropriateness 
of Expression," and has as its purpose the assessment of one's 
ability to identify poor writing, inappropriate choice of words, 
faulty sentence structure, and careless organization. The 
spelling portion of this unit is not appropriate for the Nebraska 
English curriculum, but the remainder is. 

The second section on English is, "Ability to Interpret 
Literary Materials." It consists of items over two selections 
of poetry and seven selections of prose. Although the skills 
assessed by this unit are not related to a specific unit, the 
objectives underlying the test are clearly congruent with the 
Nebraska English curriculum. 

The final section on English is, "General Vocabulary." 
Since the words are given out of context this kind of vocabulary 
test would be meaningless for the Nebraska English curriculum. 

4. Step Communication Skills Tests (Reading, Writing, 
and Listening). Princeton: Educational Testing Service, 1956. 

There are three sections, one to cover each of the basic 
communication skills of reading, writing, and listening. The 
items are multiple-choice. 

Of these sections the one on writing is appropriate for 
the Unit on Syntax and Rhetoric of the Sentence at the ninth- 
grade level of the Nebraska English curriculum. In this test 
the student is required to choose correct sentences for the 
contexts of given paragraphs. There is emphasis on clarity and 
revisions. The students are required to find flaws in given 
paragraphs. The test is well done, and can be purchased 
separately from the remainder of the battery. 

Tests of Little Value 

The majority of the tests evaluated are of little value 
in assessing the effectiveness of the Nebraska English curriculum. 
The major difference between the tests and the curriculum is 
the dependence of the tests on rote memory items based on 
traditional English. Following is a list of those tests which 
were evaluated, but are not considered useful: 

1. Objective Test in Constructive English. Logan, Iowa: 
The Perfection Form Co., 1964. 

2. Cooperative Inter-American Tests. Princeton, New Jersey: 
Cooperative Test Division, Educational Testing Service, 1950. 

3. National Achievement Test. Rockville Centre, New York: 
Acorn Publishing Co., 1939. 

4. English and American Anthology Tests. Logan, Iowa: 
The Perfection Form Co., 1959. 

5. The Perfection Book Review Tests. Logan, Iowa: The 
Perfection Form Co. 

6. Objective Tests in English. Logan, Iowa: The 
Perfection Form Co., 1965. 

7. Ohio Scholarship Tests (Every Pupil). Columbus, 
Ohio: State Department of Education. 

8. Manchester Semester-End Achievement Tests: Ninth 
Year English Fundamentals. North Manchester, Indiana: Bureau 
of Tests. 

9. Outside Reading Test. Portland, Maine: J. Weston 
Walch, Publisher, 1964. 

10. Step Essay Test. Princeton, New Jersey: Cooperative 
Test Division, Educational Testing Service, 1957. 

11. Rigg Poetry Judgement Test. Iowa City, Iowa: Bureau 
of Educational Research & Service, Extension Division, University 
of Iowa, 1942. 

12. Cooperative English Tests. Princeton, New Jersey: 
Cooperative Testing Service, Educational Testing Service, 1960. 

13. An Awareness Test in 20th Century Literature. Atlanta, 
Georgia: Turner E. Smith & Co., 1937. 

14. Every Pupil Scholarship Test. Emporia, Kansas: 
Bureau of Educational Measurements, Kansas State Teachers 
College, 1964. 

15. Novelty Grammar Tests. Portland, Maine: J. Weston 
Walch, Publisher, 1961. 

16. Iowa Tests of Basic Skills. Boston, Massachusetts: 
Houghton Mifflin Company, 1964. 

17. Eaton Diagnostic-Accomplishment Tests in English. 
Ridge Manor, Florida: Ridge Manor Publishing Company, 1966. 

18. [illegible] Eaton Literature Tests. Ridge Manor, 
Florida: [Ridge Manor] Publishing Company. 

19. Tressler English Minimum Essentials Test. Indianapolis, 
Indiana: The Bobbs-Merrill Company, Inc., 1941. 

20. SRA Achievement Series, Multilevel Edition Form D. 
Chicago, Illinois: Science Research Associates, Inc., 1963. 

21. SRA Achievement Series, Reading 1-2 Form D and Grades 
2-4 Form D. Chicago, Illinois: Science Research Associates, 
1964. 

22. Survey of Language Achievement, California Survey 
Series, Junior High Level. Los Angeles, California: California 
Test Bureau, 1957. 

23. English Tests for Outside Reading. Toulon, Illinois: 
Published by Henrietta Silliman, 1939. 

24. Essentials of English Tests. Minneapolis, Minnesota: 
American Guidance Service, Inc., 1961. 

25. Manchester Unit Elementary Tests (3rd, 4th, 5th, & 
6th grades). North Manchester, Indiana: Bureau of Tests. 

26. Manchester Unit Elementary Tests (7th & 8th grades). 
North Manchester, Indiana: Bureau of Tests. 

27. Manchester Semester-End Achievement Tests (9th grade). 
North Manchester, Indiana: Bureau of Tests. 

28. Rinsland-Beck Natural Test of English Usage. 
Indianapolis, Indiana: The Bobbs-Merrill Company, Inc., 1958. 

Few tests are available which are appropriate for testing 
outcomes of the Nebraska English curriculum. As a consequence 
of this finding, that part of the study in which an appropriate 
test was to be administered at the end of the year to all 
members of the samples was dropped. The idea was to analyze 
the items on the basis of these results. The analysis presumes 
an adequate system of evaluation. 

Evaluation of English Test Instruments and 
Test Items Made Available by Book Publishers 

After a number of English tests had been secured and evaluated 
with the expectation that several would not be of value, but 
with the hope that more would be of use in evaluating the 
Nebraska English curriculum than proved to be the case, a number 
of English books and their accompanying teacher's manuals were 
acquired. Only those materials which showed some promise of 
yielding well-written items applicable to the new English 
curriculum were ordered. The materials which appear to be of 
value are reported in the section immediately following. 

Materials Applicable to the New English Curriculum 

1. Noble and Noble Comparative Classics. New York: 
Noble and Noble, 1965. 

Items are based on a series of books on literature. The 
items are of a variety of types such as true-false, multiple-choice, 
completion, and essay. They assess both factual information 
and understanding, and penetrate the central ideas of the works. 
The major drawback in use of these items is that the literature 
presented in this series does not correspond with the literature 
in the new English curriculum. The scope of this series is 
much narrower, and the items based on it may be useful in 
stimulating further quality items based on other literature. 

2. Roberts, Paul. The Roberts English Series. New York: 
Harcourt, Brace, and World, 1966. 

The items are written in direct correlation to the text, 
and are primarily multiple-choice and completion. They test 
application of knowledge and factual information as related 
to literature. The main feature of this series is that the 
syntactical approach to grammar is much like that of the 
Nebraska English curriculum. Attempts are made to correlate 
the study of literature and the learning of grammar. Good 
literature is used. Some original composition is done. The 
test items are not of great value for the Nebraska English 
curriculum because of their close correlation to the series. 
They may provide some useful examples, if not actual items, 
for use with the Nebraska English curriculum. 



3. Postman, Neil et al. Exploring Your Language. 
New York: Holt, Rinehart, and Winston, Inc., 1966. 

Multiple-choice, completion, and original composition 
items are included in this text. The purpose of these items 
is to provide practice for the student in applying what he 
has learned from the text. The book, and the items, emphasize 
many of the things that are in the Nebraska English curriculum. 
There is a formistic approach to grammar. Tone, purpose, and 
structure are studied intensively with an eye to improving 
the student's writing style. Many of the practice items could 
be used in the classrooms using the Nebraska English curriculum 
even though they cover a narrower range of material. 

Materials of Little Value to the Nebraska English Curriculum 
Below is a liat of those texts or series which were 
scrutinised for useable it eas, hut were found wanting! 

1. Bailey, Matilda at el. Our Bnillsh Language, Third 
Edition. New York: American Book Co., 1963. 

2. Bracken, Dorothy Kendall; Ruth Marie Moscrip; and
Norma Gillett Rahder. Building Better English. Evanston,
Illinois: Row, Peterson, and Co., 1962.

3. Burrows, Alvina Trent et al. American English.
New York: Holt, Rinehart, and Winston, 1962.

4. Dawson, Mildred A. et al. Language for Daily Use.
New York: Harcourt, Brace, and World, 1959.

5. Fay, Leo C. Curriculum Motivation Series. Chicago:
Lyons and Carnahan, 1963.











6. Globe Book Company Series. New York: Globe Book
Company, 1963-66.

7. Hall, C. Held et al. Ginn Elementary English. Boston:
Ginn, 1963.

8. McKee, Paul et al. Language for Meaning Series.
Cambridge: Houghton Mifflin Co., 1956.

9. Monroe, Marion; Ralph G. Nichols; W. Cabell Greet;
and Helen M. Robinson. Learn to Listen, Speak, and Write.
Chicago: Scott, Foresman, and Company, 1964.

10. Pollock, Thomas Clark et al. The Macmillan English
Series. New York: Macmillan and Company, 1963.

11. Shane, Harold G.; Mary York; Florence K. Ferris;
Edward E. Keener. Using Good English. River Forest, Illinois:
Laidlaw Brothers, 1961.

12. Stegner, Wallace E. et al. Modern Composition.
New York: Holt, Rinehart, and Winston, 1963.

13. Wolfe, Don M. et al. Enjoying English. Syracuse,
New York: The L. W. Singer Co., 1965.

14. Wolfe, Josephine B. English Your Language. Boston:
Allyn and Bacon, 1963.

Results of Item Writing Phase

The items which were finally selected for inclusion in
the tests on Satire, Uses of Language, and Syntax and Rhetoric
of the Sentence are presented in Appendices A, B, and C. As
was noted earlier, some extremely difficult items are included
in the tests. This may have been due to the omission of a
try-out stage, but more likely was due to the conceptions of
the item writer and the investigator as to the proper procedure
for selecting 75 to 100 items for a unit. If one is to
properly sample an area with a relatively small number of items
then it follows that the majority of items written on the major
objectives of the unit will more probably require application
types of items rather than the more factually oriented problems.
As one moves away from the fact and knowledge type of question
to the application type of question, the task of writing items
which are easy and which may discriminate becomes more difficult.

To have sampled minor objectives (details, facts, and some
secondary principles) would have surely allowed an even greater 
element of chance to operate in that the results would have 
depended in part on whether or not a particular teacher properly 
emphasized that detail, or perhaps whether she even had time to 
teach such detail to classes of lower ability. Ultimately this 
more detailed kind of information is crucial in developing a 
sound curriculum. Investigations of greater scope and duration 
would not be faced with the restrictions of this investigation, 
and the investigator could develop a variety of items which 
varied in scope (knowledge to application) and difficulty 
(mastery to highly discriminating).

Results and Discussion of Item and Test Analyses

The third objective was to compare the effectiveness of
the Sampling of an Item Pool approach, which will be referred
to hereafter as the Item Pool (IP) approach, with that of the
Criterion Test (CT) approach. The main purpose of the study
was to test the hypothesis that the IP approach would prove
to be a more useful means of judging the effectiveness of
curricula. It is assumed that the multivariate objectives of
curricula are best assessed when the curriculum builder has
several items representing a variable rather than only one or
so, as in the case of the usual Criterion Test approach.

Results on Syntax and Rhetoric of the Sentence 

The results of the item analyses on Syntax and Rhetoric
of the Sentence (hereafter known as Syntax) are presented in
Tables 2 through 7. Several comments are in order about the
data associated with this unit, which also apply to the
data of the two units to follow.

First, the N's for the item data of the IP and CT
approaches are not the same. The CT approach is always based
on 100 cases in each of the pre- and post-test positions
(except in instances where some students did not respond to
a given item) whereas the IP approach is based on approximately
15 cases each in the pre- and post-test positions with
variations as low as seven and as high as twenty. This latter
fact is the result of administering only a few items to each
of 100 students in the pre- (or post-) test positions. Thus
using the IP approach results in less information per item,
but some information about a greater number of items when
using the same sample size as the CT approach. Additionally,
the fact of less information per item complicates the comparison
with the CT approach: random assignment of a small number of
students to items most certainly may yield more variable results
than results based on the entire sample. The fact of variability
is quite evident in those cases in which the same item is
employed in both the IP and CT approaches.

[Table: Item Analysis Data on Revision in Writing of the Syntax
and Rhetoric of the Sentence Unit. * Denotes all items in the
pool. ** Denotes only those items on the criterion test.]
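
The loss of information per item can be made concrete with the
usual standard error of a proportion, sqrt(p(1 - p) / N). The
sketch below is illustrative only: the formula is the textbook
one and is not taken from this report, the item difficulty of
.50 is a hypothetical value, and the N's of 7, 20, and 100 are
the extremes and the CT sample size mentioned above.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion p based on n responses."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical item that half of the students answer correctly.
p = 0.5
for label, n in [("CT, N = 100", 100), ("IP, N = 20", 20), ("IP, N = 7", 7)]:
    se = proportion_standard_error(p, n)
    print(f"{label}: standard error of item difficulty = {se:.3f}")
```

Under these assumptions the difficulty of an IP item is
estimated with a standard error at least twice that of the same
item on the Criterion Test, which is consistent with the
variability noted above.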

Second, it is noted that there are a considerable number
of negative phi coefficients reported. These phenomena may
be explained, in part, by the fact of small N's which result in
greater variability of results, negative as well as positive
(there tends to be a larger proportion of negative phi's
associated with IP than with CT). Beyond this explanation is the
possibility that the curriculum, as developed and explained to
the teachers, is the major factor, and that negative phi
coefficients constitute evidence which suggests the curriculum
is not operating effectively. In fact, a negative phi coefficient
may indicate that those persons who have not been
exposed to the curriculum perform more effectively on that item
than those who have been exposed to that particular unit of the
curriculum.
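
For readers unfamiliar with the statistic, the following is a
minimal sketch of a phi coefficient computed from a 2 x 2 table
of test position (pre or post) by item outcome (right or wrong).
It uses the standard formula and assumes, consistent with the
interpretation given above, that a positive value means the
post-test group answered correctly more often. The function
name and the counts are hypothetical and are not taken from
Tables 2 through 7.

```python
import math


def phi_coefficient(pre_right: int, pre_wrong: int,
                    post_right: int, post_wrong: int) -> float:
    """Phi for a 2 x 2 table of test position by item outcome.

    Positive when the post-test group is more often right,
    negative when the pre-test group is more often right.
    """
    a, b = post_right, post_wrong   # row 1: post-test position
    c, d = pre_right, pre_wrong     # row 2: pre-test position
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        return float("nan")         # undefined when a margin is empty
    return (a * d - b * c) / denominator


# Hypothetical small IP-sized groups: 9 of 15 right before, 6 of 15 after.
print(phi_coefficient(pre_right=9, pre_wrong=6, post_right=6, post_wrong=9))

# Hypothetical CT-sized groups: 40 of 100 right before, 60 of 100 after.
print(phi_coefficient(pre_right=40, pre_wrong=60, post_right=60, post_wrong=40))
```

The first call yields a negative coefficient and the second a
positive one of the same magnitude, illustrating how the sign
depends only on which group did better on the item.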

Of course, at least two other explanations are possible.
One is that the students responding to these items in the post-
test position were poorer students than those in the pre-test
position, hence a negative phi. The other is that the item was
written in a manner such that a little information or knowledge
was "a dangerous thing," and served only to confuse the respondents
as they attempted to solve the item.
Third, some of the rather respectable and positive phi
coefficients may be the result of these same factors
just mentioned in the case of the negative phi, only working
in reverse.

The remainder of the comments to be made within this section
are specific to the unit on Syntax. It will be recalled that
the Syntax unit was administered to the same students, in some
instances, in both the pre- and post-test positions. The results
of these tests are reported in Tables 2 to 7 under the columns
entitled Repeat Item Pool and Repeat Criterion Test. Items
administered under these conditions should result in more
positive phi coefficients and higher difficulty percents because
of the beneficial effects of having taken the same items twice.
Inspection of the medians in Tables 2 to 7 shows this to be
true except for Tables 2 and 7, in which repeat IP data are
more difficult than regular IP data. Therefore the subject may
learn to pay attention to those aspects of the course which are
suggested by the test items.

The exception to this statement would again be the Repeat
Item Pool approach in which students are randomly selected to
take a small sample of items of the total pool. Here, different
students may be responding to a given item in the Repeat IP
approach while in the Repeat Criterion Test approach the same
students are responding to the same items twice.

One other comment seems in order. The students who responded
in the post-test position in both non-repeat approaches tended
to be less capable than those students who responded in the
pre-test position. This admission renders the analysis less
useful for the curriculum builder than was planned.

Each unit is separated into topics, and comparisons of
the two approaches are made within topics. The topic of Repetition
within Table 3 affords an interesting comparison of the
two approaches. First, looking at the Repeat Criterion Test
data one could conclude that the curriculum could bear revision,
but some learning did occur. Second, the Repeat Item Pool data
suggest a rather good job of conveying information on items 75,
77, 78, 81, 82, and 87, but indicate a lack of learning on the
remaining items. The performance of these subjects on the items
used on the Criterion Test (85, 86, and 87) is markedly different
and in the negative direction except for 87 which is positive
and relatively high.

Third, the Criterion Test data using the two separate 
samples of students in the pre- and post-test positions are 
quite different and even more discouraging than the Repeat 
Criterion Test data. All phi coefficients are negative, and 
the items appear to be somewhat more difficult than in the Repeat
Criterion Test data, presenting a rather bleak picture of
curricular effectiveness.

Fourth, inspecting the Item Pool data one could conclude 
that the curriculum was generally ineffective in modifying 
student behavior and that the items were too difficult. However, 
four or five of the items showed the effects of some learning. 

The picture is somewhat brighter using Item Pool data as 
compared to the Criterion Test approach. 
On Table 4, item number 7 under the Repeat Criterion Test
column, one notices a negative phi coefficient of -.141. In
this case, in which the same 100 students took the same test
twice, it is safe to say that the teacher's conception (and
maybe the conception as presented in the curriculum materials)
is at odds with that of the item writer.

In summary of this section it should be kept in mind that
the Item Pool approach resulted in more positive data for some
topics, but for others, more negative data. The determination
of which of the approaches is more accurate needs little
discussion since the effects of the N and random item
assignment surely suggest the Criterion Test as superior. The
remaining question then is whether the negative or positive
results associated with the IP approach are due to the additional
information garnered by use of additional items or whether such
results are due to chance. The writer would prefer to believe
that the additional information gained from the use of additional
items was the proper answer, but chance cannot be ruled out.
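
How large a role chance could play can be gauged with a small
simulation. The sketch below is not part of the original
analysis: it assumes a hypothetical item whose true proportion
correct rises from .40 before instruction to .60 after, draws
independent pre- and post-test groups of equal size, and counts
how often the post-test group nevertheless scores lower, which
with equal group sizes is exactly the case of a negative phi.
The group size of 15 is an illustrative value within the
seven-to-twenty range reported for the IP approach.

```python
import random


def negative_phi_rate(n_per_group: int, p_pre: float, p_post: float,
                      trials: int = 2000, seed: int = 0) -> float:
    """Estimate how often chance alone yields a negative phi.

    With equal group sizes, phi is negative exactly when fewer
    post-test than pre-test students answer the item correctly.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    negatives = 0
    for _ in range(trials):
        pre_right = sum(rng.random() < p_pre for _ in range(n_per_group))
        post_right = sum(rng.random() < p_post for _ in range(n_per_group))
        if post_right < pre_right:
            negatives += 1
    return negatives / trials


# Hypothetical item with a real gain from .40 to .60 correct.
small = negative_phi_rate(n_per_group=15, p_pre=0.40, p_post=0.60)
large = negative_phi_rate(n_per_group=100, p_pre=0.40, p_post=0.60)
print(f"IP-sized groups (N = 15):  negative phi in {small:.1%} of trials")
print(f"CT-sized groups (N = 100): negative phi in {large:.1%} of trials")
```

Under these assumptions a genuinely effective item shows a
negative phi far more often with IP-sized groups than with
CT-sized groups, so an isolated negative coefficient in the
Item Pool data is weak evidence by itself.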

The information gathered by repeat testing of the same
students in both pre- and post-test positions seems to be of
some benefit in this kind of analysis. These results afford a
check against the data gathered by using two different populations
in the pre- and post-test positions. The information is fairly
easily secured, and if the test is administered under controlled
conditions the first time, the tendency to teach to the test may
be restricted. Certainly, if an item is not positive then one
could be relatively certain he identified a conflict in the
interpretation of the curriculum between the item writer and
the teachers. The approach seems worthy of further consideration
in future studies of this kind.

Results on Uses of Language

The test items were not re-administered to any group for
this unit, resulting in the elimination of those columns entitled
Repeat Item Pool and Repeat Criterion Test.

Considering only the median for each topic within this
unit it can be seen that there is little difference between
the two approaches for Tables 8 to 12.

Looking at the distribution of item data within those same
Tables yields a somewhat different picture. More negative phi's
are found under the Item Pool data as compared to the Criterion
Test data. Even those items used on the CT approach may more
often show as negative on the IP approach (Table 9) which lends
some strength to the argument of varied results as a result of
small N and random assignment of items to students in both pre-
and post-test positions. This kind of variable data certainly
indicates a need for large N's, but perhaps the argument for use
of the same population and repeat testing also receives some
support if the goal is curriculum revision. In repeat testing,
the fact that inflation of item data occurs has already been
demonstrated, but if item performances are considered in a



